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Abstract 



In this work we consider the state estimation problem in nonhnear/non-Gaussian 
systems. We introduce a framework, called the scaled unscented transform Gaussian 

'^ I sum filter (SUT-GSF), which combines two ideas: the scaled unscented Kalman fil- 

^ • ter (SUKF) based on the concept of scaled unscented transform (SUT) [23], and the 

Gaussian mixture model (GMM). The SUT is used to approximate the mean and 
^1 covariance of a Gaussian random variable which is transformed by a nonlinear func- 

r~| ' tion, while the GMM is adopted to approximate the probability density function 

^ Qh . (pdf) of a random variable through a set of Gaussian distributions. With these two 

tools, a framework can be set up to assimilate nonlinear systems in a recursive way. 
Within this framework, one can treat a nonlinear stochastic system as a mixture 
model of a set of sub-systems, each of which takes the form of a nonlinear system 

\^ , driven by a known Gaussian random process. Then, for each sub-system, one applies 

^^ I the SUKF to estimate the mean and covariance of the underlying Gaussian random 

variable transformed by the nonlinear governing equations of the sub-system. In- 
corporating the estimations of the sub-systems into the GMM gives an explicit 

^3 ! (approximate) form of the pdf, which can be regarded as a "complete" solution to 

the state estimation problem, as all of the statistical information of interest can be 
obtained from the explicit form of the pdf [5]. 

S^ ' In applications, a potential problem of a Gaussian sum filter is that the number 

of Gaussian distributions may increase very rapidly. To this end, we also propose an 
auxiliary algorithm to conduct pdf re-approximation so that the number of Gaussian 
distributions can be reduced. With the auxiliary algorithm, in principle the SUT- 
GSF can achieve almost the same computational speed as the SUKF if the SUT-GSF 
is implemented in parallel. 

As an example, we will use the SUT-GSF to assimilate a 40-dimensional system 
due to Lorenz and Emanuel [26]. We will present the details in implementing the 
SUT-GSF and examine the effects of filter parameters on the performance of the 
SUT-GSF. 

Key words: Data Assimilation, Ensemble Kalman Filter, Scaled Unscented 
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1 Introduction 



Data assimilation practices are often confronted by the following problems: 
(a) The systems being assimilated are nonlinear (nonlinearity) ; (b) The prob- 
ability distributions of the systems in assimilation are non-Gaussian (non- 
Gaussianity); (c) The computational cost is expensive due to high dimensions 
of the systems in assimilation (computational cost/speed). 

The ensemble Kalman filter (EnKF) [11,12,13] is a data assimilation method 
which attempts to tackle some of the above problems. Essentially, the EnKF is 
a Monte Carlo implementation of the Kalman filter (KF) [8], with an ensem- 
ble of system states as the representation of the state space of a dynamical 
system. With a typically small ensemble size, the EnKF can run cheaply. 
Moreover, by propagating an ensemble of system states forward through the 
governing equations of a dynamical system and evaluating the statistics (e.g. 
sample mean and covariance) of system states based on the propagated en- 
semble, the EnKF can also "bypass" the problem of nonlinearity in the sense 
that it does not require to linearize a nonlinear system as does the extended 
Kalman filter. However, the problem of non-Gaussianity is not fully addressed 
in the EnKF. Instead, it is usual to (implicitly) assume that, both the dynam- 
ical and observation noise, and system states (approximately) follow some 
Gaussian distributions. In practice, this assumption may not always be true. 
However, possibly because of its simplicity in implementation and the ability 
to achieve reasonable accuracy with relatively low computational cost in many 
situations, the EnKF remains to be a popular method in the community of 
data assimilation. 

In recent years, two different types of filters, namely the Gaussian sum filter 
(GSF) [1,33] and the particle filter (PF) [5,16], have attracted attention in 
the community of data assimilation for their abilities to tackle nonlinear/non- 
Gaussian data assimilation problems. For example, in [4,7,30], the authors 
adopted the GSF that consisted of a set of EnKFs, while in [34] the authors 
proposed to use the PF for data assimilation. A simplified hybrid of the PF 
and the EnKF suitable for high dimensional systems was also developed in 
[18]. 
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In terms of computational efficiency, the PF needs to generate large samples 
for approximation. In some circumstances, in order to avoid some numerical 
problems (e.g., weights collapse [6]), the number of samples needs to scale 
exponentially with the dimension of the system in assimilation, which may 
be infeasible for high dimensional systems [31]. Hence, in this paper we will 
confine ourselves to the framework of the GSF, with an attempt to tackle all 
the problems listed at the beginning. 

The framework to be introduced later is similar to those in the existing works 
[4,7,30] on the GSF. Here we use the reduced rank scaled unscented Kalman 
ffiter (SUKF) [23,28], based on the concept of scaled unscented transform 
(SUT) [23], to construct the GSF, which will thus be called the scaled un- 
scented transform Gaussian sum filter (SUT-GSF). 

The major differences between our method and the existing works are two 
fold. 

Firstly, we use the reduced rank SUKF to construct the GSF. The SUKF 
is a nonlinear Kalman filter designed to assimilate nonlinear/Gaussian sys- 
tems. Similar to the EnKF, the SUKF also generates some samples of system 
states for the purpose of approximation. However, the samples in the SUKF 
are produced in a deterministic way, and have some special properties (e.g. 
symmetry and moment catching). In [28] we show analytically that, under 
the Gaussianity assumption, the reduced rank unscented Kalman ffiter (one 
particular case of the reduced rank SUKF) can avoid some sample errors and 
bias that appear in the EnKF due to the effect of finite ensemble size. More 
details of the reduced rank SUKF will be given in § 3.2. 

Secondly, in [4,7,30], the authors assumed that in the assimilated system, 
both the dynamical and observation noise follow Gaussian distributions. If 
this assumption is violated, one may also need to use a Gaussian mixture 
model (GMM) to approximate the statistical distribution(s) of the dynam- 
ical and/or observation noise. In such circumstances, it can be shown that 
the size of the GMM will grow very rapidly with time. Maintaining such a 
large size GMM will make the computation become eventually prohibitive. In 
contrast, here we will consider the case that both the statistical distributions 
of the dynamical and observation noise are expressed (or approximated) in 
terms of some GMMs. We will introduce an auxiliary algorithm to tackle the 
problem of size growth of the GMM. We will show that, with the auxiliary 
algorithm, if implemented in parallel, the SUT-GSF can achieve almost the 
same computational speed as the reduced rank SUKF. 

The remainder of this paper is organized as follows. In § 2 we introduce re- 
cursive Bayesian estimation (RBE) as the uniform (conceptual) framework to 
solve the state estimation problem in various scenarios. To this end, in § 3 we 



first consider the state estimation problem in nonlinear/ Gaussian systems. We 
present the reduce rank scaled unscented Kalman filter (SUKF) as an approxi- 
mate solution to the state estimation problem in high dimensional systems. In 
§ 4, we then proceed to consider the state estimation problem in high dimen- 
sional nonlinear/non-Gaussian systems. Based on the framework of RBE, we 
derive the scaled unscented transform Gaussian sum filter (SUT-GSF) as an 
approximate solution, which consists of a set of parallel reduced rank SUKFs. 
To reduce the (potential) computational cost in some situations, in § 5 we pro- 
pose an auxiliary algorithm to conduct pdf re-approximation. For convenience, 
in § 6 we outline the major procedures in the SUT-GSF equipped with the 
auxiliary algorithm. An example is then given in § 7 to illustrate the details in 
implementing the SUT-GSF, and to examine the effects of filter parameters 
on the performance of the SUT-GSF. Finally we conclude this paper in § 8. 



State estimation problem in nonlinear/non-Gaussian systems and 
the conceptual solution 



We consider the state estimation problem in the following scenario: 

Xfc = Mk,k^l (xA:_i) + Ufc , (la) 

Yk = Hk (xfc) + Vfc , (lb) 

where the transition operator A4.k,k-i and the observation operator l-Lk are 
both possibly nonlinear. The dynamical and observation noise, in terms of u^ 
and Vfc respectively, are non-Gaussian, but their pdfs, p(ufc) and p(vfc), are 
assumed to be known to us. 

The problem of interest is to estimate the system state x^ at time fc, given the 
historical observations Y^ = {yk^Yk-i, ■ ■ ■ } up to and including time /c, and 
the prior pdf p (xi|Yj_i) of the system state Xj at some instant i {i < k). 

Recursive Bayesian estimation [5] provides a framework that recursively solves 
the above problem in terms of some conditional pdfs. Let p{'Xk\Yk-i) be the 
prior pdf of the state Xfc conditioned on the observations Yfc_i. Once the new 
observation y^ is available, one updates the prior pdf to the posteriory (x^jY^) 
according to Bayes' rule. Then by evolving the state x^ forward through the 
system model Eq. (la), one computes the prior pdf p(xfc+i|Yfc) at the next 
time instant. Concretely, one may formulate the mathematical description of 
the aforementioned idea as follows: 

p(xfc|Yfc„i) = /p(xfc|xfc_i)p(xfe_i|Yi.„i)rfxfc_i, (2a) 

/ IV ^ P(yfc|xfc)p(xfc|Yfc^i) 

pxfcYfc =— - — ■ — -- — — — — , 2b 

J P (yfc Xfe) p (xfc Yfc_i) dxfc 



where p(xfc|xfe_i) is equal to the value of p (u^) evaluated at u^ = x^ — 
A^fc-i,fe (xfc-i) (by Eq. (la)) and conditioned on Xfc_i, and p(yfc|xfc) is equal 
to the value of p (v^) evaluated at v^ = y^ — "H^ (x^) (by Eq. (lb)) and 
conditioned on x^. Once the conditional pdfs in Eq. (2) have been obtained, 
all of the statistical information of interest, for example, the conditional means, 
can be evaluated based on the explicit forms of the pdfs. 

Note that Eq. (2) only provides a conceptual framework for pdf estimations. 
In many situations, the integrals in Eq. (2) are intractable. Thus one may have 
to resort to some approximation method to solve Eq. (2), as will be shown 
later. 



Reduced rank scaled unscented Kalman filter as an approximate 
solution to state estimation problem in high dimensional nonlin- 
ear/Gaussian systems 



In this section we confine ourselves to the state estimation problem in non- 
linear/Gaussian systems. Here by "nonlinear /Gaussian", we mean that in 
Eq. (1), not only are the dynamical and observation noise, u^ and v^ re- 
spectively, Gaussian, but also the system state x^. 

The contents to be introduced below are necessary for deriving the scaled 
unscented transform Gaussian sum filter (SUT-GSF) in the next section. We 
will first review the concept of the scaled unscented transform (SUT) [23], and 
propose a reduced rank version of the scaled unscented Kalman filter (SUKF) 
following [23,28], with an attempt to reduce the computational cost of the 
SUKF in high dimensional systems. 



3.1 Scaled unscented transform 



The scaled unscented transform (SUT) is designed to approximately solve 
the following estimation problem: Given an m-dimensional Gaussian random 
variable x with mean x and covariance P^;, we conduct a nonlinear transform 
on X to obtain a new random variable y = f (x) ^ , where f is a nonlinear 
transform function with suitable smoothness. We are interested in estimating 
the mean and covariance of the transformed random variable y. 



^ A more general scenario is to consider the system y = f (x, u), where x represents 
system states and u the perturbations, which are assumed to be independent of each 
other, and follow some Gaussian distributions. However, one can always convert 
the above form into the simpler one y = f (z) by introducing the joint state z = 
[x-^,u-^] , where T means transpose operation. 



As a solution to the above problem, the SUT first generates a set of 2L + 1 
specially chosen system states, called sigma points, according to the following 
formula: 



Xq = x, 

A'i = X + a^/L + \ U^^ , z = 1, 2, ■ ■ ■ , L, 

A'i = X - a^/L + A (7p^) ,i = L + l,L + 2,--- ,2L, 



(3) 



where a is a scale factor, and ( a/P^J denotes the i-ih. column of an square root 

matrix -\/P^ of Pa;. When a = 1, the SUT reverts to the unscented transform 
(UT) [24], a special case of the SUT. In Eq. (3) A is an adjustable parameter, 
which is introduced as an extra freedom to tune the higher order moments 
of the set of sigma points {Xi]^^^ [24]. For a Gaussian random variable x, it 
can be shown that A = 3/a^ — L is an optimal choice in the sense that the 



higher order moments of the finite set {'^jjj^o Provides a best match of the 
Gaussian distribution of x [24]. For an m-dimensional random variable x, it is 
customary to require L > mto avoid rank deficiency in the covariance matrix 
of sigma points [24] . 

Furthermore, a set of weights {Wj}j^q, 






(4) 



-5 -"5 



is allocated to the above sigma points. It can be shown that, the weighted 
sample mean X and sample covariance Px of the finite set {A'jJ^^q match the 



mean x and covariance P^; of x, i.e.. 



2L 



x = J2w,x, = ^, 



The above identities hold independent of the choice of parameters a, A and 
L. Later in § 5 this feature will be employed to implement a strategy of pdf 
re-approximation. 

To estimate the mean and covariance of the transformed random variable 
y = f (x), we first denote the set of transformed (or propagated) sigma points 
by 3^ = {3^i : 3^i = f (^j)}j=o' then the mean and covariance of y are estimated 



by 

2L 

y = Y.^iy^^ (6a) 

i=0 
2L 

P. = E ^^ 0^^ - y) O^-^ - y)^ + (i + /3 - «') (3^0 - y) (3^0 - y)^ , (6b) 

i=0 

where the second term on the right hand side (rhs) of Eq. (6b) is introduced 
to reduce the approximation error further [22]. In the case that x follows a 
Gaussian distribution, it is suggested to choose (3 = 2 [22]. 



3.2 Reduced rank scaled unscented Kalman filter 



Here we assume that the dynamical noise u^ and the observation noise v^ 
in Eq. (1) follow zero mean Gaussian distributions, with covariance R^ for 
Ufc, and Qfc for v^. For simplicity, we further assume that u^ and v^ are 
uncorrelated white noise ^ . 

All nonlinear Kalman filters, such as the extended Kalman filter (EKF) [2], 
the ensemble Kalman filter (EnKF) [11] and the SUKF [23], can be deemed 
different methods that approximately solve the integral equations Eq. (2) of 
RBE in nonlinear/Gaussian systems. Nominally, they use the same formula 
at the filtering step Eq. (2b) to update a background to the analysis. Note 
that under the assumption of Gaussianity, to estimate the pdf of a Gaussian 
distribution, it is sufficient to estimate its mean and covariance. Thus for a 
nonlinear Kalman filter, the pdf approximation problem at the propagation 
(or prediction) step Eq. (2a) can be recast as the estimation problem stated 
as the beginning of § 3.1. Roughly speaking, it is the approach to solving the 
recast problem in § 3.1 that makes various nonlinear Kalman filters different 
from each other. 

As has been explained in § 3.1, the idea behind the scaled unscented Kalman 
filter [23] is to adopt the SUT to solve the recast problem. For computational 
efficiency, here we introduce a reduced rank version of the SUKF based on 

[21,28]. 

Without loss of generality, we assume that at time fc — 1, one has obtained an 
m-dimensional analysis sample mean x^_^ and an m x li^_i square root 



Qtxa 



fe-1 



CA,--i,ieA;-i,i, ■ ■ • , ak-ii^,_j^Gk^i^i^,_^ 



(7) 



^ The filter forms in the cases that u^ and v^ are correlated and/or colored noise 
can be derived in a similar way. We refer the readers to, e.g., [2, Ch. 11], for the 
details. 



of the error covariance P^.i- Here (Jk-i,i and e^-i^j (i = 1, ■ ■ ■ ,lk-i) are the 
leading /fc_i eigenvalues and corresponding eigenvectors of P^_i, which can 
be obtained through a fast singular value decomposition (SVD) algorithm, 
for example, the Lanczos or block Lanczos algorithm [10,15] (especially for 
a sparse matrix). Note that for our purpose, we only need to compute the 
first Ik-i leading pairs of eigenvalues and eigenvectors, rather than the whole 
spectrum. This strategy may help reduce the computational cost in high di- 
mensional systems. To see this, we use a simple scenario for illustration, where 
the m- dimensional dynamical system is given by x^+i = Ax^. A is taken to 
be a full rank matrix (otherwise the model size can be reduced). Then the 
computational complexity of propagating one sigma point forward is 0{m^). 
Therefore for the full rank SUKF (i.e. /^-i > ^, the computational com- 
plexity of propagating all 2lk-i + 1 sigma points forward is at least 0{m^). 
In contrast, for the reduced rank SUKF, by using the Lanczos algorithm or 
its variants to compute the eigenvalues and eigenvectors, the computational 
complexity of one iteration is at most 0{im?') [10, p. 35], or even less for a 
sparse matrix. Thus to evaluate the first Ik-i pairs of the eigenvalues and 
eigenvectors, the computational complexity is Ik-i x n** x 0{im?), where n** 
is the average number of iterations in finding a pair of eigenvalue and corre- 
sponding eigenvector through the Lanczos algorithm, and the computational 
complexity of evolving 21^-1 + 1 sigma points forward is {21^-1 + 1) x 0{im?). 
Therefore, for the reduced rank SUKF, the overall computational complexity 
of generating sigma points and propagating them forward is approximately 
[/fc_i X (fi** -|-2) + 1] X 0{m?'). This can be much less than 0{m?) in some large 
scale systems, such as a weather forecasting model with several million state 
variables, while the sizes of Ik-i and n** can be chosen in the orders of 10^ and 
10"^, respectively, or even less (for example, see [37]). 



'^k-i,i ( • '^^^ t)e generated, 
in the spirit of Eq. (3), as follows: 



■ya _ -a 

Xk-i,i = ^k-1 + Oi (4-1 + A) ak^i,iek-i,i, i = 1, 2, ■ ■ ■ , 4_i, 

1 /9 

Xl^-i i = x^_i - « (^'-i + A) (Xk-i^iBk-i^i, , i = k-i + 1, /fc-i + 2, ■ ■ ■ , 24-1- 



For convenience of discussion, we say that the above sigma points are generated 
with respect to the "quartet" (a, A,x^_^, S%°_A. According to Eq. (4), we also 
specify a set of weights associated with the above sigma points 



a^(4_i + A) a^ 



1 
2a2(d + A)'""^'"' 



(9) 



Wk-i,i = TT^rn — TT. * = 1, 2, ■ ■ ■ , 2/fc_i. 



After generating sigma points at fc — 1, one propagates them forward through 
the system model Eq. (la). Here we let the ensemble of forecasts of the prop- 
agations be denoted by 



X^ = {xt, : x^, = Mk,k+i (-^fc-i,) , ^ = 0, ■ ■ ■ , 2/fc-i} , (10) 



which can be considered as an analogy to the background ensemble in the 
framework of the EnKF. The ensemble mean x^ and covariance P^ will be 
estimated in accordance with the SUT in § 3.1. In what follows, we split the 
procedures of the reduced rank SUKF into the propagation and filtering steps. 



3.2.1 Propagation step 

At the propagation step, the ensemble mean x^ and covariance P^ are evalu- 
ated in the spirit of Eq. (6) such that 



±l=J2 W-.-Mxt,, (11a) 

2/fc-i 

pt = E w,-,, U, - xt) U, - ^lY (lib) 



i=0 



T 



+ (l + /3 - «2) (x^ - xt) (xt,o - xt) + Qfc 



Note that, in the presence of the dynamical noise term u^ in Eq. (la), there 
is an alternative way to represent its effect, that is, in Eq. (10) one adds a 
noise term u^ ^ to Mk,k+i i^k-i,i) so that x^ ^ becomes Mk,k+i f'^fcLi^J +^1,1 
(i = 0, ■ ■ ■ ,24_i), where u^^ is a sample of the dynamical noise u^. Corre- 
spondingly, the covariance matrix Q^ in Eq. (lib) should be removed. In this 
work, we do not attempt to compare these two different ways in representing 
the effect of dynamical noise. Since this section mainly serves to provide an 
analytic result for later use in introducing the SUT-GSF, we keep using the 
current forms Eq. (10) and (lib). 

To compute the Kalman gain K^, it is customary to first compute the cross 
covariance P'^ and the projection covariance Pf'^ [22,24], which is also carried 



out in the spirit of Eq. (6) such that 

2L 

yk=Y.Wk-i,nk{^l^, (12a) 

i=0 
2L 

PZ = E W,-i, (xt, - xt) {n, (xtj - y,.)"^ (12b) 

i=0 

+ (l + /3 - a^) (xt,o - xt) (?/, (xt,o) - y.)"" , 

2L ^ 

Pf = E W^^-i. (^fc (4.) - y;^) (^fc (xt) - yfe) (12c) 

+ (l + /3 - «2) (h. (xt,o) - y.) (n, (xt,o) - y^ 



For numerical reasons, it is often desirable to re-write the above covariances 
in terms of square root matrices. To this end, we introduce S^ and Sj!, which 
are defined as 



S^ 



Sh 
k 



W^fe"-1,0 (4,0 - X^.) , \/W^ (xt,i - X^) , ■ ■ ■ , v/W^fc-1,2/.-. (xl2/,_, - Xt 

(13a) 
W^, {Uk (x^,o) - yfc) , v/w^ (^fc (xt J - y.) , (13b) 



where W^_^ q = W4_i,o + 1 + /? — a^. Then the above covariances read 

P^ = S^(SD^ + Qfc, (14a) 

Pu=^l{^lf . (14b) 



Pr = sUs^)^ (14c) 



Accordingly, the Kalman gain K^ can be calculated in terms of the above 
square roots as 



/ ^ \ —1 / \T / / \T 

cr I TiPi" I "D \ G^ ( O^ 1 I O^ / O^ 



-1 



K. = Pr Pf + K^ =SUSt SMS^J +R, . (15) 



3.2.2 Filtering step 

When a new observation is available, we update the sample mean and covari- 
ance as follows: 



±t = ±l + KJyk-'Hj±l)), (16a 



Pt = Pl-K,{p-f. (16b) 



10 



.21. 



A'^j > with re- 
spect to (a, A, x^, S^°) according to Eq. (8), where S^'" is the square root of P^ 
in a form analog to Eq. (7), and calculate the associated weights {W^.ijjJ'o ^^" 

'^ki f . 

forward to start the next assimilation cycle. 

More concretely, to obtain such a square root matrix 

Sr = K,iefc,i, ■ ■ ■ , crk,h^kh] , (17) 

where crfc,j's and e^^j's (i = 1, ■ ■ ■ , /fc) are the first Ik leading pairs of eigenvalues 
and eigenvectors of P^, we just need to specify the value of Ik after we choose a 
certain SVD algorithm. For convenience, hereafter we will call Ik the truncation 
number at time k. In this paper, we use the following rule [28] 



^k,i > trace [Plj /Efc , z = 1, ■ ■ ■ , 4 , 
al, < trace (P^) /Efc , z > /^ + 1 , 

to determine the value of Ik, where trace (Pfc) means the trace of the matrix 

P^, and Tk is a threshold. To preventing Ik getting too small or too large, we 
also pre-specify the lower and upper bounds, denoted by k and lu respectively, 
of Ik to guarantee that k < h < lu- The implementation of the rule Eq. (18) 
will be discussed with more details in 5 7.2. 



Scaled unscented transform Gaussian sum filter as an approx- 
imate solution to the state estimation problem in high dimen- 
sional nonlinear/non-Gaussian systems 



In this section we proceed to consider the state estimation problem in high 
dimensional nonlinear /non- Gaussian systems. Based on the idea of Gaussian 
sum approximation, we will show that one can approximately solve the prob- 
lem through the Gaussian sum filter (GSF), which consists of a set of parallel 
reduced rank SUKF, and which we term the scaled unscented transform Gaus- 
sian sum filter (SUT-GSF). 

In § 2 we remarked that Eqs. (2b) and (2a) give only a conceptual solution 
to the state estimation problem in nonlinear/non-Gaussian systems, in the 
sense that the integral equations in Eq. (2) are often intractable. To overcome 
this difficulty, one idea is to approximate the conditional pdfs on the right 
hand side of Eq. (2) through a set of Gaussian pdfs. This is often known as 
the Gaussian sum approximation, or Gaussian mixture model (GMM) in the 



11 



literature [2, ch. 8]. In this way, state estimation in a nonlinear/non-Gaussian 
system can be approximately recast as state estimation in a set of parallel 
nonlinear/Gaussian systems. Therefore the reduced rank SUKF introduced in 
the preceding section can be applied to each individual nonlinear/ Gaussian 
system to evaluate the mean and covariance of the corresponding Gaussian 
pdf. 

More concretely, suppose that the pdfs p (u^) and p (v^) of the dynamical 
and observation noise at time k can be approximated by n^ and n^ Gaussian 
distributions respectively, such that 



pK,)^E«M^(ufc:0,Qfc,) , (19a) 

pK)^ga^,,iVK:0,Rfc,) , (19b) 



j=i 



where N ['x: fi, S) means that the pdf of a random variable x follows a Gaus- 
sian distribution with mean /i and covariance S, a^ ^ G [0, 1] is the weight asso- 

dated with A^ (u^ : 0, Qk,i), which satisfies that a^^ G [0, 1] and J2i=i (^ti — ^■ 
The weights a^j's are defined similarly. 

Moreover, let the prior pdf of the initial condition xq of system states be 
p (xo) = p (xo|Y_i) (Y_i can be treated as an empty set if no observation is 
available before the assimilation starts), which can again be approximated by 
a set of ng^ Gaussian distributions, so that 



n: 



xb 



"0 

p [Xo) ^ X. 70, 



-^^'^•iV(xo:x^„Py , (20) 



where 7o,j G [0, 1] and J2i=i 1o,i = 1- Note that if Pg.j -)> for i = 1, ■ ■ ■ , ng^, 
then each Gaussian distributions N (xq : Xq.jjPo,*) approaches a Dirac delta 
function with the point mass at Xq j. Thus Eq. (20) is equivalent to conducting 
a Monte Carlo approximation, with the samples being Xq ^, z = 1, ■ ■ ■ , ng''. 

By applying Eqs. (2b) and (2a), one can recursively compute the prior and 
posterior pdfs of the state x^, p (x^jY^.i) and p (x^jY^) respectively. Like the 
reduced rank SUKF, we also split the procedures in the SUT-GSF into two 
steps: propagation and filtering. 



12 



4-1 Propagation step 

Without lost of generality, we assume that at time k — 1, we have the pos- 
terior pdf p (xfc_i|Yfc„i), which is approximated in terms of nf.°i^ Gaussian 
distributions such that 

fc — 1 

p (x,_i|Y,„i) ^ Y. Pk-i,N (x,_i : xL,,, P^i,) , (21) 

where /3fe-i,j G [0, 1] and Y,i=i^ Pk-i,i = 1- Moreover, by Eqs. (la) and (19a) it 
is clear that 

K 
p (xfc|xfc_i) ^ J2 ^l,i^ i^k ■■ Mk-i,k (xfc_i) , Qfe,i) . (22) 

Then, according to Eq. (2a), the prior pdf p (xfc|Yfc_i) is given by 
p(xfc|Yfc„i) = / j9(xfc|xfc„i)p(xfc_i|Yfc_i)rfxfc_i 



i=i j=i 



(23) 



where 



hj (xfc) = I AT (xfc : Mk-i,k (xfc_i) , Qfc_i,,) AT (xfc_i : x^„, ., P^_, .) t/x^.i. 

(24) 
The evaluation of lij (x^) can be treated as a nonlinear/ Gaussian estimation 
problem discussed in § 3, therefore the SUT can be applied to approximate 
Ijj (xfc) as a Gaussian distribution A^ (x^ : ^k,(i,j)^'^k,(i,j))^ where ^iuj\ and 
P^ (■ ■^ are the mean and covariance of the background evaluated by propa- 
gating forward the analysis at instant k — 1, with mean x^_i j and covariance 
P^_]^ j, through the following nonlinear/ Gaussian system: 



Xfc+l — Aik,k+1 (Xfc) + Ufcj , 

p (ufcj) = A^ (ufcj : 0, Qfcj) . 
Therefore, as an approximation we can re- write p(xfc|Yfc„i) as 

„xa u 

"fe-1 "fe 

lb 



(25) 



p(xfe|Yfe_i) ^ ^ ^ a^^/fc_i,i A^ (xfc : x^^(i^j.), Pfc^(i^j.) 
= ^7MA^(x.:xt,P^ 



(26) 



k,s j 1 
s=l 
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where nf' = nlt-^nl, '-)k,s = Cik,jf^k-i,i with the integer index s being a one- 
dimensional representation of the index {i,j), e.g., s = i + n%°i^{j — 1), 1 < 
i < n%'^^ and 1 < j < n^ . 



4-2 Filtering step 

After the observation y^ is available, one can update the prior pdf p (xfc|Yfc_i) 
to the posterior p(xfc|Yfc), according to Bayes' rule Eq. (2b). Also note that, 
by Eqs. (lb) and (19b), it is clear that 

P iyk\^k) ^ E <,N (y, : Hk (xfc) , Rfc,) . (27) 

Substituting Eqs. (26) and (27) into Eq. (2b), we get 
p (xfelYfc) oc p (yfclxfc) p (xfel Yfe^i) 



i=i j=i 



(2J 



E E 7;i.«fc,,^^ (y^ : ^fc (x^J , Pm + ^^^i) J^,.- (x*^) 



j=l i=l 



where in the first line of Eq. (28), "oc" means "proportional to" (by discarding 
the constant /p (y^jx^) p (xfc|Yfc_i) (ix^ in Eq. (2b)), Pf^j in the third line of 
Eq. (28) is the projection covariance of the Gaussian random variable with 
mean x^ ^ and covariance P^ j, which can be computed in the context of the 
reduced rank SUKF as introduced in § 3, and 

^ _ iV (xt : xt,, P|,.) N (yt : H„ (xt) . Kt,,) 



Eq. (29) can be interpreted as follows. One has the prior pdf A^ (x^ : x^ j, P^^ 
of Xfc. A new observation y^ is obtained through the following observer 

Yk = Hk (xfc) + Vfcj , 

p (vfc J = A^ (vfcj : 0, Rfcj) . ^ ' 

According to Bayes' rule, Jjj (x^) is then the posterior pdf of x^ with the 
observation y^ made by the observer Eq. (30). 

It can be shown that if the observation system Eq. (lb) is linear Gaussian, 
then Jjj (xfc) is also Gaussian, with mean and covariance of the analysis up- 



14 



dated from the mean and covariance of the background based on the ordi- 
nary Kalman filter (see, for example, [20]). However, if Eq. (lb) is a non- 
linear/Gaussian system, we follow the reduced rank SUKF to approximate 
Jjj (xfc) by a Gaussian pdf N fx^ : x^ (j^), P^ (jjJ, with mean Xfc^(jj) and co- 
variance PkXij) gi'^^n by 

^k,(i,j) = K,i + Kfc,(ij) (yk - Hk (xfc,i)) , (31a) 

where 

Kfc,(ij) = Pfe^i [P^k,i + R-fcj) • (32) 

Analogous to Eq. (26), if we let n%"- = nl''nl, s = i + nf'lj — 1) [l < i < nf' 
and 1 < J < n^), and 









then we obtain 

p (xfcl Yfc) ^ ^ /3fc,,iV (xfc : x^,„ P^ J . (34) 

s=l 

Similar formulae are also obtained in [2,4]. 

4-3 Statistics estimation based on the posterior pdf 



The posterior pdf p(xfc|Yfc), given in Eq. (34), embodies all of the necessary 
statistical information. In particular, one may be interested in estimating the 
conditional mean x^ = E(xjt|Yfc) and covariance P^ = Cov(xfc|Yfc), which 
are given by [2, ch. 8] 






i^l = Y.PkAls. (35a) 

s=l 

p^ = E Pk,s (pL + {n - n,s) {n - n,sY) ■ (35b) 

s=l ^ ^ 

In principle the above computations can be done in parallel, using n^ in- 
dependent processor units, each of them adopting a reduced rank SUKF to 
assimilate a sub-system described by Eqs. (25) and (30). The final results are 
simply the weighted averages of the outputs of the individual processors. 
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5 An auxiliary algorithm 

One potential problem of the Gaussian sum filter (GSF) is that, the num- 
ber of Gaussian distributions in a GMM may grow very rapidly in certain 
circumstances. To see this, let the number of Gaussian distributions used to 
approximate the distributions of the background, the analysis, the dynamical 
noise and the observation noise at time k be nf!', nf"", n^ and n^ respectively. 
In the previous section we have shown that 



Therefore, if n^ > 1 or n^ > 1 at all times, n^^ and n^" will grow exponentially 
with time. This substantially increases the computational cost of the GSF. 

To reduce the computational cost, the authors in [1,33] suggested that "it 
is possible to combine many terms into a single term without seriously af- 
fecting the approximation ". In addition, some weights in the Gaussian sum 
approximation, i.e., some 7fc,s's in Eq. (26) and some f3k,s's in Eq. (34), may be 
sufficiently small compared to the others so that they can be simply neglected 
[1,33]. 

Another possible strategy is to conduct pdf re-approximation, i.e., one uses a 
new Gaussian mixture model, with the specified number of Gaussian distri- 
butions, to approximate the prior or the posterior pdf that has already been 
expressed in terms of a Gaussian sum approximation (for example, see [30]). 
To estimate the parameters of the new Gaussian mixture model (i.e., weights, 
means and covariances of individual Gaussian distributions), the author in [30] 
suggested to adopt the expectation-maximization (EM) algorithm. However, 
the EM algorithm is an iterative method, which may require many iterations 
for convergence. Thus, using the EM algorithm in high- dimensional systems 
might be computationally intensive. 

In this work we propose another method to reduce the computational cost, 
which is also based on the idea of pdf re-approximation. Our criterion for pdf 
re- approximation is that the mean and covariance of the new Gaussian mixture 
model match those of the original one. We note that, doing this may incur 
some information loss during pdf re- approximation, since in general there is 
no guarantee that the new GMM also preserves the higher order moments 
of the original one. However, the benefit of adopting this re-approximation 
scheme is that, if the SUT-GSF is implemented in parallel, then in principle 
the computational speed of the SUT-GSF will almost be the same as that of 
the reduced rank SUKF, as will be shown below. 

For illustration, let p (x) be the pdf of a random variable x, which is expressed 
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in terms of a Gaussian mixture model with n Gaussian distributions so that 

n 

p(x) = ^a,iV(x:/i,,Si) , (37) 

where a^ is the weight associated with the Gaussian distribution iV (x : /Xj, Sj) 
with mean /Zj and covariance I]j. Our objective is to approximate p (x) by 
another Gaussian mixture model p (x) with m Gaussian distributions {m < n), 
which reads 

m— 1 

p(x)= Y.b^N{^■.Z,,^,) , (38) 

where bi is the weight associated with the distribution A^ (x : Zi, $j). We want 
to choose proper values of b^, Z^ and $j so that the mean and covariance of 
p (x) match those oi p (x). According to Eq. (35), the mean and covariance of 
p(x), denoted by x and P respectively, are given by 

n 

X = ^ ai^i , (39a) 

1=1 

n 

P = 5] ai (S, + (/ii - x) (/i, - x)^) . (39b) 

Similarly, the mean x and covariance P of p (x) are given by 

m—l 

±=Y. bi^^ ' (40a) 

i=0 
m—l 

P=Y.b^ (*. + {2, - ±) (Z, - ±f) . (40b) 

Thus our objective is to balance Eqs. (39) and (40) such that 



X = X, 

P = P 



(41) 



To this end, we first perform a matrix factorization, such as SVD, to find a 
square root matrix 

S = [si,S2,---,Sp] (42) 

rp 

of P with p column vectors Sj (i = 1, ■ ■ ■ ,p), such that P = S (S) . From S, 



Si — C [Si, S2, ■ ■ ■ , Sg] , 



we construct two more square root matrices Si, and S2 as follows 

(43) 



S2 — [dSi,- ■ ■ ,dSq, Sg+1, ■ ■ ■ , S- 



where c is a coefficient in the interval [0, 1], d = (1 — c^Y^'^ is a coefficient 
complementary to c (for convenience, we will call d the "complementary coef- 
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ficient" hereafter), and q is an integer no larger than p (i.e., q < p) ^ . Then 
it is clear that 

Si(Si) +82(82) =S(S)^ = P. (44) 



In order to balance Eqs. (39) and (40), we let 

m—l 



X = y^ 6iZi = X , 

j=0 




m—l 


= Si (Si 


m—l rp 

)^ 6,$, = S2 (S2) . 





T 



(45a) 
(45b) 
(45c) 



s=0 



Comparison with Eq. (5) shows that Eqs. (45a) and (45b) can be solved based 
on the SUT, by treating Zj's as a set of sigma points and 6j's the associated 
weights. For example, by taking the scale factor a = 1 in the SUT, we generate 
a set of m = 2g + 1 "' sigma points Zi with respect to the quartet (1, rj, x. Si) 
with 



•0 


= X, 








■i - 


= x + 


c^/q^r]Si,i- 


= 1,2,- 


■■ ,Q, 


■i - 


= X — 


Cy/q + r]Si, i ■■ 


= g + i, 


g + 2. 



(46) 
,2g, 

where 77 is an adjustable parameter analogous to A in Eq. (3). Then in the 
spirit of Eq. (4), the associated weights foj's are given by 

bo 



(47) 
2{q + ri) 

In particular, rj = 1/2 means that 60 = bi for i = 1,2,--- ,2g so that all 
Gaussian distributions are equally weighted. 



^ One can choose q > p. The idea is to produce I sets of sigma points in the spirit 
of Eq. (46) with the same column vectors Sj's, each of which consists of qo sigma 
points so that qo < P and I x qQ > p, but with different coefficient q's (i = 1, • • • , ^) 

in Eq. (46). Moreover, the Cj's satisfy X^ c^ < 1, and the weights in Eq. (47) will 

i=l 

also have to be adjusted accordingly. 

^ Choosing m = 2q + 1 means that m can be an odd integer only. If one wants 
m be even, one can use the set of sigma points {Zi}-^^ (by excluding Zq) and 
the associated weights {6i},j=i for the pdf re-approximation, where Zi^s and 6j's are 
sigma points and the corresponding weights given by Eqs. (46) and (47), respectively. 
It can be shown that the weighted mean and covariance of the set {-2i}j=i' with 
{fejljli being the weights, also capture the mean x and the covariance P [21]. 



2q 

To solve Eq. (45c), we note that the weights bi's in Eq. (47) satisfy J2 bi = I- 
Thus a simple choice is to let $j be the same for alH = 0, ■ ■ ■ , 2g so that 

$, = $ = S2 {S^Y . (48) 

With Eqs. (46) - (48), our objective, in terms of Eq. (41), can thus be achieved. 
Eq. (48) indicates that S2 is just a square root matrix of $. From this point 
of view, the complementary coefficient d in S2 influences how the pdf re- 
approximation is conducted. To see this, we use the special case p = q in 
Eq. (43) for illustration. In this case, when d — )■ 0, $ — )■ and the Gaussian 
distributions A^ (x : Zi, $) [i = 0, ■ ■ ■ ,2q + 1) approach delta functions with 
point masses at -Ej's. Thus the GMM in Eq. (38) will approach a Monte Carlo 
approximation with 2j's being the samples. On the other hand, when d — > 1, 
c — )■ 0, hence all the Gaussian distributions A^ (x : Zi, $) (z = 0, ■ ■ ■ , 2g + 1) 
approach A^ f x : x, Pj, thus the GSF will approach the reduced rank SUKF. 

In the previous section we saw that, for a reduced rank SUKF, at the filter- 
ing step of each assimilation cycle we need to perform an SVD in order to 
produce sigma points. Therefore, for the SUT-GSF equipped with the aux- 
iliary algorithm, by letting the covariances of all Gaussian distributions in 
the re-approximated GMM be the same (cf. Eq. (48)), we can perform an 
SVD at the filtering step only once for both the purpose of generating sigma 
points for individual reduced rank SUKFs, and that of conducting pdf re- 
approximation. Therefore, if the SUT-GSF is implemented in parallel, the 
SUT-GSF can achieve almost the same computational speed as the reduced 
rank SUKF^ . The concrete implementation of the SUT-GSF will be described 
with more details in the next section. 

Remark: For the SUT-GSF, even if both n^ and n1 in Eq. (36) are equal to 
1 (thus the number of Gaussian distributions does not grow), we still suggest 
to implement the auxiliary algorithm at the filtering step of each assimilation 
cycle. The reasons are as follows: 

Firstly, the Gaussian sum filter may suffer from the outlier problem. For some 
individual Gaussian distributions TV (x : /Xj, I]j) in a Gaussian mixture model 
(GMM), the observation y may be too far way from the projections of the 
means yUj's onto the observation space, i.e., the distances \\y—'H{fj,i) II2 are large 
enough to make the weights of the Gaussian distributions A^ (x : /ij, I]j) neg- 
ligible compared to the other Gaussian distributions. In such circumstances. 



^ Compared with the reduced rank SUKF, visually there is only one extra operation, 
i.e., the evahiations of the mean x and covariance P in Eq. (39), in the SUT-GSF. 
However, the computational cost in executing this extra operation will be negligible 
(compared with the other operations) if the number of Gaussian distributions is not 
too large. 



19 



if the tiny weights are continually carried forward to subsequent assimilation 
cycles, the weights of individual Gaussian distributions might "collapse" just 
like the situation in the particle filter [6]. In this case, the weight of one par- 
ticular Gaussian distribution in the GMM is very close to 1 while the weights 
of the other Gaussian distributions are almost zero. Thus the Gaussian sum 
filter is effectively reduced to a nonlinear Kalman filter and may suffer from 
numerical problems as very tiny values are involved in computation. In such 
circumstances, the auxiliary algorithm is used to adjust the weights of the 
Gaussian distributions in the GMM by replacing the original Gaussian dis- 
tributions by new ones. Our experience shows that equipping the SUT-GSF 
with the auxiliary algorithm can efficiently improve the stability of the filter. 

Secondly, in general, the auxiliary algorithm can also help to decrease the 
computational cost of the SUT-GSF. To see this, note that for the SUT-GSF 
not equipped with the auxiliary algorithm, the covariances of all Gaussian 
distributions may not be the same. Therefore in order to produce sigma points 
for the reduced rank SUKFs, one may have to perform an SVD for each 
different covariance. In contrast, the SUT-GSF equipped with the auxiliary 
algorithm only needs one SVD to generate sigma points for each SUKF, since 
through pdf re-approximation, one can choose to let the covariance of each 
individual Gaussian distribution be the same. 



6 Outline of the procedures in the SUT-GSF with the auxiliary 
algorithm 



To avoid distraction, we will discuss the initialization of the SUT-GSF in 
§ 7.3.2. Here let us focus on the procedures after the SUT-GSF is initialized. 

Without loss of generality, we assume that at instant k — 1, the posterior pdf 
p(xfc_i|Yfc_i) is re-approximated by 

Moreover, for each Gaussian distribution N (xfc_i : Z^^i^s^ ^fc-i) (^ = 0, ■ ■ ■ , 2g), 
there is a set of 21^-1 + 1 sigma points \ X^_i if . , with the associated weights 

L ' J i=Q 

W^_ii > . The procedure in the next assimilation cycle is outlined as fol- 
lows: 

(1) Propagation step: 

• Given the pdf p (u^) ^ J2i=i ^ki^ (^fc : 0, Q,k,i) of the dynamical noise, 
divide the original dynamical system into n^ sub-systems described by 
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Eq. (25). 

'^fc-i i f forward through each sub- 

system, and evaluate the background mean x^ ^ and covariance P^ ^ (as 
well as cross and projection covariances) according to the formulae in 
§ 3.2.1, now with s = 1, ■ • ■ ,nf , nf = (2/fc_i + 1) x n^. 
Update the weights b^-i/s to 7^^^ = Cikj^k-i,h with s = i + (2/fc_i + 
1) X (j - 1), z = 1, ■ ■ ■ ,24_i + 1,'j = 1, ■ ■ ■ ,<. 
Form the prior pdf 



p(x,|Y,„i)^E7M^(xfc:xt,„Pt, 

(2) Filtering step: 

• Given the pdf p (v^) ^ Z^i^i Cik i^ {^k '■ 0, Rfc,j) of the observation noise, 
divide the original observation system into n^ sub-systems described by 
Eq. (30). Evaluate the Kalman gain K^ (s = 1, ■ ■ ■ , n^'^, n^" = rv^ x n^) 

for each Gaussian distribution N rx.k ^x^jjPfcj) (j = l^""" )''^fc*) in 
each sub-system. 

• With the incoming observation y^, update x^ j and P\j (j = 1, ■ ■ ■ , nf') 
to x^ g and P^ ^ (s = 1, ■ ■ ■ , n^") for each Gaussian distribution 



A'( 



Xfc : x^ ■, P^ ■) in each sub-system, according to Eq. (16); 



:J' k,jj 

Update the prior weights 7/fc,j's {i = 1, ■ ■ ■ yuf') to the posterior ones 
/3fe,s's (s = 1, ■ ■ ■ ,n^") according to Eq. (33). 
Form the posterior pdf 



^.a 



p(xfc|Yfc)^5:/3,,,iV(x,:x«,„P^,, 



(3) Re-approximation of p (x^l Y^) by a set of m = 2g + 1 Gaussian distribu- 
tions 

2g 

p (xfc|Yfc) = Y^ bk,sN (xfc : Zk,s, ^k) 

s=0 

based on the auxiliary algorithm. 

• Evaluate the mean x^ and covariance P^ of p(xfc|Yfc) according to 
Eq. (35); 

• Conduct truncated SVD on P^ to obtain an approximate square root 

^T = [Ck,lGk,l, ■ ■ ■ , 0'k,l^ek,lk] 

of P^, where ak/s and e^j's are eigenvalues and eigenvectors of P^ 
(i = 1, ■ ■ ■ , /fc), and the truncation number Ik is determined by the rule 
Eq. (18). For simplicity here we suppose that q < h- Then we construct 
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two more matrices as follows: 

where c is a coefficient in [0, 1], and d = (1 — c^y^"^ is the coefficient 

complementary to c. 

Generate 2g + 1 centers {^fc,s}slo °^ Gaussian distributions as sigma 

points with respect to the quartet (l, r;, x^, Sfc,i) (cf. Eq. (46)), where rj 

is a free parameter to be specified by the readers. Allocate the weights 

{bk,s}g'Lo according to Eq. (47). Note that Sfc_2 is a square root of the 

covariance matrix $fc. 

For each Gaussian distribution N (x^ : Z^^g, $fc) (s = 0, ■ ■ ■ , 2g), gener- 

rY^ j !• with respect to the quartet 

fa, A, Zk,s, Sfc,2) according to Eq. (8); Calculate the associated weights 

\W^A_ according to Eq. (9). 



7 An example: Assimilating a 40-dimensional system 



7.1 The dynamical system, the observer, and the measure of filter perfor- 
mance 



We choose the r-dimensional system model due to Lorenz and Emanuel [25,26] 
(LE98 model hereafter) as the testbed. The governing equations are given by 

-j^ = (xi+i - Xi_2) Xi_i -Xi + F,i = l,--- ,r. (49) 

The quadratic terms simulate advection, the linear term represents internal 
dissipation, and F acts as a constant external forcing term [25]. Also note that 
the variables Xj's are defined cyclically such that x_i = Xr-i, xo = Xr, and 

Xr+l = Xi. 

In this perfect model scenario, there is no dynamical noise (except for some 
discretization errors), but for convenience in using the established formulae 
in the previous sections, technically we can model the dynamical noise at 
an arbitrary assimilation cycle k, denoted by u^, by a Gaussian distribution 
A^ (ufc : 0, 0) with zero mean and zero covariance. 

In the observation system, we let the observer Tik be an identity operator 
unless otherwise stated. For an identity observer, given a system state x^ = 
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[xk,i, ■ ■ ■ , Xk,r]'^ at the k-th assimilation cycle, the observation is the realization 
of the following random process 

Yfc = ■Hfc(xfc) + Vfc =Xfc + Vfc, (50) 

where v^ follows the r- dimensional Gaussian distribution A^(vfc : 0, 1,.) with I,. 
being the rxr identity matrix. Note that, both the dynamical and observation 
noise are described by a single Gaussian distribution, thus the size of the 
Gaussian mixture model in the SUT-GSF does not grow. Nevertheless, we 
will still conduct pdf re-approximation in the SUT-GSF for the reasons given 
in § 5. 

In our experiments, we choose r = 40 and F = 8 so that the LE98 model 
will exhibit chaotic behaviour [25,26]. We use a fourth-order Runge-Kutta 
method to integrate (and discretize) the system from time to 50, with a 
constant integration step of 0.05 (so there are 1001 integration steps overall). 
The observations are made at each integration step unless otherwise stated. 

We adopt the time-averaged relative root mean square error (relative rmse for 
short) to measure the performance of the filter, which is defined as 



tS 



Ut-Kh/Kh, (51) 

A:=0 



where kmax is the maximum integration step {kmax = 1000 for our experi- 
ments), x^'' denotes the truth (the state of a control run) at the k-th cycle, 
and II •II 2 means the 2-norm. 



7.2 Implementation issues 



Before presenting the numerical results, we would like to discuss the configu- 
ration issues of the SUT-GSF. 



7.2.1 Positive semi-definiteness of the covariance m^atrices in the reduced 
rank SUKF 

One important issue in implementing the SUT-GSF is to guarantee the posi- 
tive semi-definiteness of the covariance matrices in the reduced rank SUKF. 

To this end, first of all we require Ik + X > so that the square root of /^ + A 
in Eq. (8) is real. Also note, when computing the covariances in § 3.2.1, the 
effective weight of x^_,_j^q is Wk^ + 1 + /3 — a^ (/3 > 0). So we also require 
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Wkfl + 1 + /3 — a^ > 0, which, together with Eq. (9), imphes that 

I ,,/ ,, +1-^1 +l + /3-a'>0. (52) 

Ik may take different values at different assimilation cycles. However, since Ik 
is set to be bounded such that < /; < /^ < 4 (cf. § 3.2.2), with some algebra, 
one can obtain the sufficient conditions 

A > -L ^' 



(2 + /3)" 



a > 



a < 



\ 



\ 




(53) 



2 + /3 + J(2 + /3)' ^' 



that guarantee the positive semi-definiteness. 

7.2.2 The choice of the threshold F^ in the reduced rank SUKF 

The choice of the threshold F^ in § 3.2.2 (to determine the truncation number 
/fc, hence the number of sigma points) follows the procedure in [28]. At the 
first assimilation cycle we specify a threshold Fq. If Fq is a proper value such 
that the corresponding truncation number Zq satisfies k < Iq < lui then we 
keep Fq and at the next cycle we start with Fi = Fq. If Fq is too small such 
that /o < k, then we increase it gradually by replacing Fq with I.IFq + 200 
^ . We continue the replacement until Iq falls into the specified range, or the 
number of the replacement operations is up to 30 (in which case we simply 
put Iq = li, regardless of what Fq is). Similarly, if Fq is too large such that 
Iq > lu, then we decrease it gradually by replacing Fq by Fq/I.I — 200. We 
continue the replacement until /q falls in the specified range, or the number 
of the operations is up to 30 (in which case we simply put /q = 4)- After the 
adjustment, at the next cycle we start with Fi = Fq and adjust it (if necessary) 
to let h fall into the specified range, and so on. 

7.2.3 Covariance inflation and filtering 

In order to improve the filter performance, we introduce two extra techniques, 
called covariance inflation and covariance filtering, to the reduced rank SUKF 
(hence the SUT-GSF). 



This is an ad lioc choice. Other choices shall also be acceptable. 
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The main idea of covariance inflation is to increase either the background or 
the analysis covariance at each assimilation cycle by a constant factor, which 
proves to be a simple but very useful technique in improving the performance 
of the EnKF, such as robustness against divergence, and accuracy. For exam- 
ples, see [4,29,35]. The initial motivation to introduce covariance inflation to 
the EnKF is that, the error covariance of the EnKF will be systematically un- 
derestimated due to the effect of small ensemble size [35] (but for an ensemble 
size larger than 10, this effect might be negligible). We note that, the EnKF 
with covariance inflation used in those works is similar to the Kalman-fllter 
with fading memory (KF-FM) [32], [36, ch. 15], which might better explain 
the success of the covariance inflation technique. In the EnKF, by conducting 
covariance inflation on the background error covariance P^ at time k ^ , one 
in effect increases the relative weight of the incoming observation y^ when up- 
dating the background x^ to the analysis x^ according to Eq. (16a). Note that, 
the background x| itself contains historical information contents before time k 
(for example, the initial condition Xq and past observations {y^, i < k}). Thus 
if one conducts covariance inflation at each assimilation cycle, the weights of 
historical information contents in affecting the behaviour of the EnKF will 
decrease exponentially. This is often desired because a fllter may be subject to 
various sources of errors, for example, the error in choosing an initial condi- 
tion, occasional outliers in observations, and the sub-optimality of a fllter used 
for data assimilation. In such circumstances, an incoming observation might 
often be more reliable than the background. Hence it is rational to give the 
incoming observation more weight to update the background. In this way, the 
fllter will become more robust (against divergence) and often more accurate. 

Since the SUKF also uses Eq. (16a) to update the background to the analysis, 
it is natural to adopt covariance inflation in the SUKF (hence the SUT-GSF). 
In this work, we follow the method used in [4,35] and choose to multiply the 
analysis error covariance P^ ^ of each reduced rank SUKF in the SUT-GSF by 
a factor (1+6)^. Thus, after we update the error covariance to P^ ^ in a reduced 
rank SUKF, we replace it by (1 + S)'^ P^ ^ and use the inflated covariance for 
the subsequent computations. However, how to choose the optimal value of 
the covariance inflation factor 6 (to minimize the relative rmse in Eq. (51)) is a 
complicated problem [3]. In our opinion, one important factor that influences 
the optimal value of 6 is the relative reliability of the background and the 
observation. Thus in general, the optimal value of 6 might appear different in 
different contexts. 

The other technique, covariance flltering [17,19], is proposed to tackle the ef- 
fect of small ensemble size on the error covariances (e.g., the background error 
covariance, the projection covariance, and the cross covariance etc.). Because 



^ Increasing the analysis error covariance will increase the background error covari- 
ance at the next assimilation cycle. 
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of the typically small ensemble size in the EnKF, spuriously large correlations 
between distant locations may appear in the error covariances [17]. To ad- 
dress this problem, one may introduce a distance-dependent tapering function 
to the error covariances such that the correlation between two points in the 
state space will decrease to zero as their distance grows. The decreasing rate of 
the correlation is controlled by the type of the tapering function, and a "length 
scale" parameter, denoted by Ic in this paper. Note that, even with the covari- 
ance filtering technique, we expect that the effect of small ensemble size on 
the error covariances cannot be completely removed. However, our experience 
shows that conducting covariance filtering may make the filter more robust 
than not doing so. Thus in this paper, we will choose to conduct covariance 
filtering on the background error covariance, the projection covariance, and 
the cross covariance at each assimilation cycle of each reduced rank SUKF in 
the SUT-GSF. We follow the method in [19] to conduct covariance filtering, 
with the tapering function being the fifth order function in Eq. (4.10) of [14]. 

Note that, the distance in covariance filtering used in this work is different from 
those used in real applications, where the distance dij is normally defined as a 
function of the distance between the locations i and j in the three dimensional 
physical world (see, for example, [9]). But for the L96 system (of dimension 
40), this definition is not applicable. This is because the states of the L96 
system do not have any physical meaning, so that we cannot observe them in 
the physical world. Similarly, the physical distance between the i-th and j-th 
elements (i, j = 1, ■ ■ • , 40) of a model state x = (xi, ■ ■ ■ , 0:40)^ is also not well 
defined. 

For the above reasons, we define the distance dij between the i-th and j-th 
elements of a random variable x in the following way. Suppose that A is a 
covariance-type matrix of x with m rows (and the number of its rows is not 
less than the number of its columns), so that 



T T 

V • • • V 

^ 1 ) ; ^ m 



A 

where r,- is the i-th row of A. We define 



(54) 



dij = \\rj -rjy/lc, (55) 

where Ic is the length scale ^ . Our experience shows that covariance filtering 
conducted in this way can achieve the same effect as that obtained by using the 



^ Note that we have assumed that the number of the rows of a matrix (not neces- 
sarily square) is not less than the number of its columns. If this is not the case, then 
it is suggested to choose the column vectors to calculate the distances dij in Eq. (55) 
instead. In this way, covariance filtering can be applied to non-square matrices like 
the cross covariance (when the dimension of the state space is not equal to that of 
the observation space). 
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physical distances between different locations to construct the taper matrix 
[27, § 3.3.3.2]. 

In our experiments, covariance filtering will be conducted on the background 
covariance, the cross covariance, and the projection covariance (cf. Eqs. (lib), 
(12b) and (12c)) of each individual reduced rank SUKF. 



1.3 Numerical experiments and results 



There are various parameters in the SUT-GSF. These include the intrinsic 
parameters in the reduced rank SUKF, for example, the parameters a, /3 and 
A ect. (cf. § 3.2), and the parameters in the SUT-GSF for the purpose of 
pdf re-approximation, for example, the number m of Gaussian distributions 
in the re-approximated Gaussian mixture model (GMM), the complementary 
coefficient d and the parameter rj etc. (cf. § 5). 

In what follows we will conduct two experiments to examine the effects of some 
of the above parameters. In the first experiment, we will examine the effects of 
the inflation factor 6 and the length scale Ic on the performance of the reduced 
rank SUKF. The effects of the other intrinsic parameters of the reduced rank 
SUKF were reported in [28]. Hence we will flx them in the experiment. As 
the SUT-GSF consists of parallel reduced rank SUKFs, we expect that 6 and 
Ic would influence the performance of the SUT-GSF in the same way as they 
influence the reduced rank SUKF. In the second experiment, we will examine 
the effects of the number m of Gaussian distributions and the complementary 
coefficient d on the performance of the SUT-GSF. We will always set rj = 1/2 
in our experiments so that all the Gaussian distributions in the GMM have 
the same weight. 



7. 3. 1 Effects of the inflation factor 5 and the length scale Ic on the perfor- 
mance of the reduced rank SUKF 

Because there are no general rules on how to choose the optimal values of 5 
and /c, we choose to examine their effects on the performance of the reduced 
rank SUKF within certain ranges, which can be used as the empirical guide 
for the choice of 5 and Ic later on. 

In our experiments the size of the GMM will not grow. Thus by letting the 
number of Gaussian distributions equal 1, the SUT-GSF is equivalent to the 
reduced rank SUKF and there is no need to conduct pdf re-approximation (as 
the re-approximated pdf will always be equal to the original one). For this 
reason, here we specify the intrinsic parameters of the reduced rank SUKF 
(cf. § 3.2), which are set as follows. We let a = 1, /3 = 2, A = —2, the initial 
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threshold Fq = 1000, the lower bound k = 3 and the upper bound /„ = 6. 
We increase the inflation factor 6 from to 10, with a fixed increment of 0.5 
each time. For notational convenience, we denote by : 0.5 : 10 the values of 6 
chosen in this way. We also vary the length scale Ic from 10 to 400, with a fixed 
increment of 20 each time. Thus the values of Ic are denoted by 10 : 20 : 400 in 
a similar way. The initialization of the reduced rank SUKF follows the same 
procedures in the SUT-GSF (as the reduced rank SUKF can be treated as a 
special case of the SUT-GSF). But note that, here we conduct the experiment 
only once, rather than repeat it for a number of times as the subsequent 
experiment does. We feel it shall be sufficient for our purpose to give a sketch 
of the dependence of the relative rmse (of the reduced rank SUKF) on 6 and 
I 9 

In Fig. 1 we plot the relative rmse of the reduced rank SUKF as a function 
of the inflation factor S and the length scale Ic- As one can see, when fixing 
the length scale Ic, the relative rmse of the SUKF exhibits a U-turn behaviour 
as the infiation factor 6 increases: when 6 increases from 0, the relative rmse 
tends to decrease at the beginning. For 6 sufficiently large, however, increas- 
ing the value of 6 further will instead cause a larger relative rmse. The U-turn 
phenomenon can be explained as follows. When there is no covariance infla- 
tion (6 = 0), it can be shown that the error covariance of the reduced rank 
SUKF is systematically underestimated, similar to the arguments in [35]. This 
means that we are over-confldent about the accuracy of the background. Con- 
sequently, the analysis to be updated will rely too much on the background, 
which itself may not be very accurate due to various error sources (e.g., the ef- 
fect of small ensemble size, the sub-optimality of the fllter). On the other hand, 
increasing 6 will lead to larger background error covariance, which means we 
become more uncertain about the background. Thus if 6 gets too large, the 
analysis to be updated will rely too much on the incoming observation, which 
may also cause relatively large relative rmse as the information contents from 
our prior knowledge (the background) will possibly be underrepresented. In 
contrast, a "moderate" inflation factor 5, as a trade-off between being too 
large and too small, will instead reduce the relative rmse. 

On the other hand, when flxing the inflation factor 6 in Fig. 1, if 5 is not 
large (say, 6 < 2), then the relative rmse appears insensitive to the change 
of Ic] a 2 < 6 < 4, the relative rmse also exhibits the U-turn behaviour as 
Ic increases; but if 5 > 4, overall the relative rmse of the SUKF tends to 
decrease as Ic increases. The U-turn behaviour can be explained from the 
following point of view ^^ : by conducting covariance filtering, one increases 



^ This is also supported by the numerical results in Fig. 3, where the standard 
deviation in the case m = 1 is quite smah. 

^^ The authors would hke to particularly thank one of the anonymous reviewers for 
providing us this explanation. 
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the effective ensemble size. The smaller the value of Ic, the more obvious the 
effect of covariance filtering. However, a too small Ic may substantially distort 
the original dynamics of the system, and thus deteriorate the performance of 
the filter. So here again it is a "moderate" value of l^ that achieves a better 
performance. 



7.3.2 Effects of the number of Gaussian distributions and the complementary 
coefficient on the performance of the SUT-GSF 

Now we proceed to examine the effects of the number of Gaussian distributions 
and the complementary coefficient on the performance of the SUT-GSF. Here 
we consider two scenarios. 

In the first scenario, we consider an ideal situation, where the observation 
system observes the full elements in a state vector, and records the observa- 
tions at every integration step. The parameters in the SUT-GSF are chosen 
as follows. For each reduced rank SUKF in the SUT-GSF we let a = 1, /9 = 2, 
A = —2, the initial threshold Fq = 1000, the lower bound k = 10, the upper 
bound lu = 10, the inflation factor 6 = 6, the length scale Ic = 240. We let 
the number of Gaussian distributions in the original GMM increase from 1 to 
11, with a fixed increment 2 each time. Although in the experiments the size 
of the GMM will not grow, we still choose to conduct pdf re- approximation. 
For this purpose, we fix 77 = 1/2 so that all the Gaussian distributions in the 
re-approximated GMM are equally weighted. We let the number m = 2g + l of 
Gaussian distributions in the re-approximated GMM equal that of the original 
GMM, i.e., m = l:2:ll(g = 0:l:5), and we vary the complementary 
coefficient d so that it takes the values of 0.05 : 0.1 : 0.95. 

To start the assimilation, we randomly choose an initial condition for a control 
run, and so obtain the true trajectory within the specified assimilation window. 
We then add some Gaussian noise drawn from the distribution A^ (vfc : 0, I40) 
to the true trajectory to generate the observations. The noise level (relative 
rmse) of the observations 6°''^ ~ 0.22. We also generate 10 randomly perturbed 
initial conditions as the background ensemble at the first assimilation cycle. 
We use the background ensemble to initialize the prior pdf of system states 
in terms of a GMM (cf. Eq. (20)). To this end, one may use the auxiliary al- 
gorithm in § 5 to capture the sample mean and covariance of the background 
ensemble. The weights, means, and square roots of covariance matrices of in- 
dividual Gaussian distributions in the initial prior pdf can thus be determined 
in the same way as that described in § 5. Thus, in effect we allocate all compo- 
nents in the initial GMM an equal covariance. This might not be the optimal 
choice, but it is a relatively simple strategy for implementation. In a long-term 
run, the impact of the choice of the initial GMM may fade away as time moves 
forward, especially in the presence of the covariance inflation technique (see 
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the discussion in § 7.2.3). Similar argument can also be found in [18]. 

With the above information, sigma points and their associated weights for 
each Gaussian distribution can also be generated. In order to reduce the effect 
of statistical fluctuations, we repeat the experiments 20 times, each time with 
a new randomly generated background ensemble, while all the other settings, 
including the initial condition and the values of m and d, remaining unchanged. 

Fig. 2 shows the relative rms errors (averaged over 20 experiments) of the SUT- 
GSFs as functions of the complementary coefficient d. As one can see, when 
the number m of Gaussian distributions is relatively small, say m = 1,3,5, 
the relative rmse errors does not change significantly as d increases from 0.05 
to 0.95. In particular, when m = 1, the SUT-GSF is equivalent to the reduced 
rank SUKF, where the complementary coefficient d does not affect the values 
of the relative rmse (as pdf re- approximation does not take effect in this case). 
In contrast, when m is relatively large, say m = 7,9,11, the relative rmse 
exhibits a different behaviour as d increases. The relative rmse with a small 
complementary coefficient (e.g. d = 0.05) is much higher than that with a 
large complementary coefficient (e.g. d = 0.95). Note also that, when d = 0.95 
(close to 1), all the SUT-GSFs approach the reduced rank SUKF, as we have 
pointed out in § 5. Hence in this case their relative rms errors are all close to 
that of the reduced rank SUKF. 

In Fig. 3 we show the standard deviations of the relative rms errors of the SUT- 
GSFs in Fig. 2, as functions of d. As one can see there, the standard deviations 
behave like the relative rms errors in Fig. 2. For small m (say m = 1,3), the 
standard deviations are very small and do not change significantly with d. 
But for a relatively large m (say m > 5), when d is small (e.g. d = 0.05), 
the standard deviations may appear quite large. Again, as d approaches 1, the 
standard deviations of all the SUT-GSFs approach that of the reduced rank 
SUKF. 

The under-performance of the SUT-GSF with a relatively large m but small 
d may have a connection with the slow convergence rate of the Monte Carlo 
approximation. When d is small, the GMM approaches the Monte Carlo ap- 
proximation, which converges at a rate of 1/y/m. As the relatively large num- 
bers m (say ??7, = 11) used in our experiments are typically very small for the 
purpose of convergence, it thus leads to relatively large estimation errors and 
standard deviations. On the other hand, the phenomenon that when d is small 
(say d = 0.05), the SUT-GSF with a small m (say m = 3) will perform better 
than the SUT-GSF with a relatively large m (say m = 11), is less understood. 
A possible explanation might be that, when m is small, the SUT-GSF is close 
to the reduced rank SUKF, which implicitly assumes that system states fol- 
low a Gaussian distribution. Although the Gaussianity assumption may not 
be true, it still works better than the Monte Carlo approximation with such 
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a small number of samples. 

Since in Fig. 2, with the other conditions being the same, a larger number 
m of Gaussian distributions does not necessarily guarantee a lower relative 
rmse, we need to adopt a different measure to see the benefit of using a larger 
number m of Gaussian distributions in the SUT-GSFs. To this end, note that 
in the context of our experiments, the relative rmse e^ is a function of m and 
d. Thus we define a new measure e™"(?7i) as follows 

e™"(?7i) = argmin er(m, d) . (56) 

d 

In the above equation, ej"'"(m) means the minimum value of er{m,d) within 
the range of d tested for a given m. For this reason, we will call e'^^"{m) the 
"minimum relative rmse" . 

In Fig. 4 we plot the minimum relative rmse e™" of the SUT-GSF as a function 
of the number m of Gaussian distributions. As one can see, the minimum 
relative rmse monotonically decreases as m increases from 1 to 11. Thus a 
larger number of Gaussian distributions can benefit the performance of the 
SUT-GSF in the sense that it can achieve a lower minimum relative rmse. 

In the second scenario, we consider a situation closer to that in real ap- 
plications. We let the observation system observe only odd-order elements 
{xi,X3, ■ ■ ■ , X39} in a state vector x = (xi, 2:2, ■ ■ ■ , 3:40)'^, and record the obser- 
vations for every 10 integration steps. The covariance matrix of the observation 
noise now becomes A^ (v^ : 0, l2o)- In the absence of observations, no update 
of the background will be conducted. We simply propagate the background 
forward to the next assimilation cycle, without changing the weights of sigma 
points and the weights of individual components in the GMM. The intrinsic 
parameters of the SUT-GSF are set as follows. The lower bound /; = 10, the 
upper bound /„ = 20, the inflation factor 6 = 0, the length scale Ic = 400, 
and the initial background ensemble size n = 20. The other unmentioned 
parameters take the same values as those in the first scenario. We also re- 
peat the experiments for 20 times, each time with a new randomly generated 
background ensemble, while keeping all the other settings unchanged. 

Fig. 5 shows the relative rms errors as functions of the complementary coeffi- 
cient d in the second scenario. Here, because less observations are available in 
assimilation, the obtained relative rms errors are larger than those in Fig. 2. 
Nevertheless, the behaviour of the SUT-GSF in this case is similar to that 
in the first scenario. For example, when the complementary coefficient d is 
small (say d = 0.05), a smaller size GMM (say m = 1,3) performs better 
than a larger size one; the performances of the GMMs tend to converge as 
d approaches 1. Because of the reduction of available information from the 
observations, in general the associated standard deviations of the relative rms 
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errors in Fig. 6 are larger than those in Fig. 3, as one may expect. Finally, 
Fig. 7 exhibits a clear similarity to Fig. 4, in the sense that the minimum 
relative rms errors in both figures monotonically decrease as the number of 
Gaussian distributions grows. 



8 Conclusion 



In this paper we introduced a new filter, called the scaled unscented trans- 
form Gaussian sum filter (SUT-GSF), to assimilate nonlinear/non-Gaussian 
systems. To set up the framework of the SUT-GSF, we first presented the 
reduced rank scaled unscented Kalman filter (SUKF) for high dimensional 
nonlinear/Gaussian systems. Then, we introduced the idea of Gaussian sum 
filter (GSF) from the point of view of recursive Bayesian estimation (RBE). 
Combining the reduced rank SUKF and the GSF will lead to the SUT-GSF, 
which essentially consists of a set of parallel reduced rank SUKF. To reduce 
the computational cost of the SUT-GSF, we also introduced an auxiliary algo- 
rithm to conduct pdf re-approximation, which almost does not infiuence the 
computational speed of the filter if the SUT-GSF is implemented in paral- 
lel. As an example, we used the 40-dimensional LE98 model to illustrate the 
details in implementing the SUT-GSF, and to examine the effects of various 
filter parameters on the performance of the SUT-GSF. Numerical results of 
our experiments showed that, a larger number of Gaussian distributions ben- 
efited the performance of the SUT-GSF in the sense that it achieved a lower 
minimum relative rmse. 
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Fig. 1 . The relative rmse of the reduced rank SUKF as a function of the inflation 
factor 5 and the length scale Ic- 
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Fig. 2. The relative rms errors of the SUT-GSFs as functions of the complemen- 
tary coefficient d. Here we present the SUT-GSFs with the numbers of Gaussian 
distributions m = 1 : 2 : 11, in the first scenario. 
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Fig. 3. Standard deviations associated with the relative rms errors of the SUT-GSFs 
in Fig. 2, as functions of the complementary coefficient d, in the first scenario. 
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Fig. 4. (Local) minimum of the relative rms errors in Fig. 2, as a function of the 
number m of Gaussian distributions, in the first scenario. 
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Fig. 5. The relative rms errors of the SUT-GSFs as functions of the complementary 
coefficient d, in the second scenario. 
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Fig. 6. Standard deviations associated with the relative rms errors of the SUT-GSFs 
in Fig. 5, as functions of the complementary coefficient d, in the second scenario. 
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Fig. 7. (Local) minimum of the relative rms errors in Fig. 5, as a function of the 
number m of Gaussian distributions, in the second scenario. 
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